[ML] Correct query times for model plot and forecast #327
lib/api/CForecastRunner.cc
core_t::TTime bucketLength{model.s_ForecastModel->params().bucketLength()};
core_t::TTime startTime{model_t::sampleTime(
    feature, forecastJob.s_StartTime, bucketLength)};
core_t::TTime endTime{model_t::sampleTime(
Would it be possible to fix it in CAnomalyJob::doForecast instead? CForecastRunner is just a dumb worker and should not have any important logic. CAnomalyJob::doForecast calls into the runner and sets startTime to m_LastResultsTime. It seems to me that adjusting it there does the same thing but is a bit cleaner. endTime is anyway just relative to startTime.
Maybe the same can be done for model plots.
The problem is this is feature specific. So it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.
I could create a wrapper which implements the logic in the model library. I can't directly push the feature into the forecast function (because it is in the maths library, which can't depend on EFeature). I could supply a callback to compute the offset start and end times and have this use the wrapper from the model library.
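For illustration only, a minimal sketch of the callback idea under stated assumptions: `forecastTimes`, `makeOffsetAdjuster`, `TAdjustTimesFunc` and the `TTime` alias are hypothetical stand-ins, not the actual maths or model library API. The point is that the forecasting side only sees a function from requested times to query times, so it needs no knowledge of EFeature.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using TTime = std::int64_t; // stand-in for core_t::TTime
using TTimePair = std::pair<TTime, TTime>;
// Maps the requested [start, end) to the times the model should be queried at.
using TAdjustTimesFunc = std::function<TTimePair(TTime, TTime)>;

// Hypothetical "maths side": knows nothing about features, only the callback.
std::vector<TTime> forecastTimes(TTime start,
                                 TTime end,
                                 TTime bucketLength,
                                 const TAdjustTimesFunc& adjust) {
    std::vector<TTime> result;
    if (bucketLength <= 0) {
        return result;
    }
    TTimePair adjusted{adjust(start, end)};
    for (TTime time = adjusted.first; time < adjusted.second; time += bucketLength) {
        result.push_back(time);
    }
    return result;
}

// Hypothetical "model side": turns a feature-specific offset into a callback.
TAdjustTimesFunc makeOffsetAdjuster(TTime offset) {
    return [offset](TTime start, TTime end) {
        return TTimePair{start + offset, end + offset};
    };
}
```

With one-hour buckets and an assumed half-bucket offset, `forecastTimes(0, 7200, 3600, makeOffsetAdjuster(1800))` yields the query times 1800 and 5400.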
Alternatively, how about I add a function to actually run the forecast to model_t which wraps up this detail. Given we only have the maths::CTimeSeriesModel here (for good reason) this seems like it might be the cleanest option.
The problem is this is feature specific. So it is tricky to push it higher up if the forecast is being run over a job with multiple detectors with different features.
OK, I see, and agree that's too complicated.
What about inside of model.s_ForecastModel->forecast(...)?
That hits the library dependency issue mentioned above. However, what about if I have a CForecastDataSink::SForecastModelWrapper::forecast function which takes the forecast job? This could wrap all the functionality now in this loop.
Sounds good. I am also OK if we keep the current version, given that the alternatives are too complicated.
I think I like the idea of wrapping this in SForecastModelWrapper. It seems more natural to me than in this loop, which is really just about scheduling. I'll make it and see how it looks.
f980f26. Note that none of the members of SForecastModelWrapper are needed outside of the new forecast function, so I converted to a class.
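As a rough sketch of the shape this takes (not the code in f980f26): the class below keeps its members private and exposes a single forecast function which takes the forecast job and applies the time offset internally. `CForecastModelWrapperSketch`, `SForecastJobSketch`, its `s_Duration` field and the constructor arguments are all hypothetical names, and the model is reduced to a plain function of time.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using TTime = std::int64_t; // stand-in for core_t::TTime

// Hypothetical, reduced forecast job: only the fields this sketch needs.
struct SForecastJobSketch {
    TTime s_StartTime;
    TTime s_Duration; // assumption: the job carries a duration in seconds
};

class CForecastModelWrapperSketch {
public:
    using TModelFunc = std::function<double(TTime)>;
    using TTimeDoublePr = std::pair<TTime, double>;
    using TTimeDoublePrVec = std::vector<TTimeDoublePr>;

    CForecastModelWrapperSketch(TTime bucketLength, TTime sampleOffset, TModelFunc model)
        : m_BucketLength{bucketLength}, m_SampleOffset{sampleOffset},
          m_Model{std::move(model)} {}

    // The caller only schedules jobs; the feature-specific time adjustment
    // stays in here.
    TTimeDoublePrVec forecast(const SForecastJobSketch& job) const {
        TTimeDoublePrVec result;
        if (m_BucketLength <= 0) {
            return result;
        }
        TTime startTime{job.s_StartTime + m_SampleOffset};
        TTime endTime{startTime + job.s_Duration}; // end is relative to start
        for (TTime time = startTime; time < endTime; time += m_BucketLength) {
            result.emplace_back(time, m_Model(time));
        }
        return result;
    }

private:
    TTime m_BucketLength;
    TTime m_SampleOffset; // assumption: precomputed from the feature
    TModelFunc m_Model;
};
```

The scheduling loop in the runner then reduces to calling `forecast(job)` on each wrapper, with nothing feature-specific left outside the class.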
LGTM
I removed the |
We discussed this some more. There were some misunderstandings about the nature of the change, but there was also a change to the default offsets in time buckets at which forecast points were requested. I reverted to the old style of defining the forecast points at "bucket time", i.e. offset zero, in #332. We will target this and #332 together at 6.5.4.
We were querying for the model bounds and forecast points at the beginning of each bucket. Instead we should match the time offset we apply to bucket samples when we update the model.
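To make the mismatch concrete, here is a small sketch under stated assumptions. `EFeatureKind`, `sampleTime` and `queryOffset` below are hypothetical simplifications, not the real model_t definitions, and the assumption that some features are sampled at the bucket centre is only for illustration.

```cpp
#include <cstdint>

using TTime = std::int64_t; // stand-in for core_t::TTime (seconds)

// Hypothetical reduction of the feature enumeration to two cases.
enum class EFeatureKind {
    E_SampledAtBucketCentre, // assumption: sample attributed to mid-bucket
    E_SampledAtBucketStart   // assumption: sample attributed to bucket start
};

// Hypothetical analogue of model_t::sampleTime: the time at which a bucket's
// sample is added to the model for this kind of feature.
TTime sampleTime(EFeatureKind feature, TTime bucketStart, TTime bucketLength) {
    switch (feature) {
    case EFeatureKind::E_SampledAtBucketCentre:
        return bucketStart + bucketLength / 2;
    case EFeatureKind::E_SampledAtBucketStart:
        return bucketStart;
    }
    return bucketStart;
}

// Gap between the time the model was updated at and the bucket start, which
// is where the bounds and forecasts were being queried before the fix.
TTime queryOffset(EFeatureKind feature, TTime bucketStart, TTime bucketLength) {
    return sampleTime(feature, bucketStart, bucketLength) - bucketStart;
}
```

Under the mid-bucket assumption, a bucket length of 86400 s gives `queryOffset(EFeatureKind::E_SampledAtBucketCentre, 0, 86400) == 43200`, i.e. the query is 12 hours away from the sample, whereas for short buckets the same gap is barely visible.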
The upshot was that model bounds and forecasts were (typically) offset in time with respect to the data values. The problem is particularly noticeable for long bucket lengths. For example, the figures below show the model bounds for 1-day buckets before and after the fix.
Before:
After: